Unsupervised document summarization from data reconstruction perspective
Authors
Abstract
Due to its wide applications in information retrieval, document summarization is attracting increasing attention in natural language processing. A large body of recent literature has implemented document summarization by extracting sentences that cover the main topics of a document with minimal redundancy. In this paper, we take a different perspective, that of data reconstruction, and propose a novel unsupervised framework named Document Summarization based on Data Reconstruction (DSDR). Specifically, our approach generates a summary that consists of those sentences that can best reconstruct the original document. To model the relationship among sentences, we first introduce linear reconstruction, which approximates the document by linear combinations of the selected sentences. We then extend it to non-negative reconstruction, which allows only additive, not subtractive, linear combinations. To handle nonlinear cases and respect the geometrical structure of the sentence space, we also extend linear reconstruction to the manifold adaptive kernel space, which incorporates the manifold structure by using the graph Laplacian. Extensive experiments on summarization benchmark data sets demonstrate that our proposed framework outperforms the state of the art. © 2015 Published by Elsevier B.V.
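The linear reconstruction idea in the abstract can be illustrated with a small sketch. The abstract does not state the optimization procedure, so the code below is an assumption-laden illustration rather than the DSDR algorithm itself: it assumes sentences are given as rows of a term-weight matrix `V`, fits the combination weights with a small ridge penalty, and picks sentences by greedy forward selection. The function names `reconstruction_error` and `select_sentences` and the `ridge` parameter are hypothetical. The non-negative and manifold adaptive kernel variants are not covered.

```python
import numpy as np


def reconstruction_error(V, selected, ridge=1e-3):
    """Squared error when every row of V is approximated by a linear
    combination of the rows indexed by `selected`.

    The weights are fit with a small ridge penalty (an assumption made
    here to keep the linear system well conditioned).
    """
    X = V[selected]                                   # (k, d) chosen sentences
    gram = X @ X.T + ridge * np.eye(len(selected))    # (k, k)
    A = np.linalg.solve(gram, X @ V.T).T              # (n, k) weights
    return float(np.sum((V - A @ X) ** 2))


def select_sentences(V, k, ridge=1e-3):
    """Greedy forward selection (illustrative, not the paper's solver):
    repeatedly add the sentence that most reduces the reconstruction error."""
    selected = []
    for _ in range(min(k, len(V))):
        remaining = [i for i in range(len(V)) if i not in selected]
        best = min(
            remaining,
            key=lambda i: reconstruction_error(V, selected + [i], ridge),
        )
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Toy document: three identical sentences about one topic and one
    # sentence about a second topic (rows are term-weight vectors).
    V = np.array([
        [1.0, 1.0, 0.0],
        [1.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
        [1.0, 1.0, 0.0],
    ])
    print(select_sentences(V, k=2))
```

In this toy case a duplicate sentence adds nothing to the reconstruction, so after one sentence from the first topic is chosen, the greedy step favors the sentence that covers the remaining topic. This is how a reconstruction objective discourages redundancy without an explicit redundancy term.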
Similar resources
Cascaded Attention based Unsupervised Information Distillation for Compressive Summarization
When people recall and digest what they have read for writing summaries, the important content is more likely to attract their attention. Inspired by this observation, we propose a cascaded attention based unsupervised model to estimate the salience information from the text for compressive multi-document summarization. The attention weights are learned automatically by an unsupervised data rec...
Multi-Document Summarization using Distributed Bag-of-Words Model
As the number of documents on the web is growing exponentially, multi-document summarization is becoming more and more important since it can provide the main ideas in a document set in short time. In this paper, we present an unsupervised centroid-based document-level reconstruction framework using distributed bag of words model. Specifically, our approach selects summary sentences in order to...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
An Unsupervised Multi-Document Summarization Framework Based on Neural Document Model
In the age of information exploding, multi-document summarization is attracting particular attention for the ability to help people get the main ideas in a short time. Traditional extractive methods simply treat the document set as a group of sentences while ignoring the global semantics of the documents. Meanwhile, neural document model is effective on representing the semantic content of docu...
Salience Estimation via Variational Auto-Encoders for Multi-Document Summarization
We propose a new unsupervised sentence salience framework for Multi-Document Summarization (MDS), which can be divided into two components: latent semantic modeling and salience estimation. For latent semantic modeling, a neural generative model called Variational Auto-Encoders (VAEs) is employed to describe the observed sentences and the corresponding latent semantic representations. Neural va...
Journal title:
- Neurocomputing
Volume 157, Issue
Pages -
Publication date: 2015